Abstract

An algorithm counting the number of ones in a binary word is presented,
running in time O(log log b), where b is the number of ones. The
operations available include bit-wise logical operations and multiplication.


1 Introduction 

The operation of counting the number of ones in a binary word consisting of n
bits has received considerable attention in quite different fields such as
Cryptography 0 and Chess Programming [1], where the operation is used to evaluate
the legal moves a player has in a given position. Counting ones is also known
under the names sideways addition [J], bit count [5], or population count [1].

An early reference describing a non-trivial method for counting ones is the
article by Wegner [5]: Instead of looping through all n bits of a machine word,
the right-most one of an operand x > 0 is repeatedly deleted by the operation
x and (x - 1), where “and” denotes a bit-wise operation. In this way complexity
O(νx) is achieved, where νx is the number of ones in the input x. This technique
is also suggested in Exercise 2-9 of [5].
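
The deletion of the right-most one can be sketched in a few lines of Python. The function name `count_ones_wegner` is chosen here for illustration only, and the input is assumed to be a non-negative integer.

```python
def count_ones_wegner(x: int) -> int:
    """Count the ones of a non-negative integer by repeatedly
    deleting its right-most one with x and (x - 1)."""
    if x < 0:
        raise ValueError("x must be non-negative")
    count = 0
    while x != 0:
        x &= x - 1      # clears the lowest set bit of x
        count += 1      # one loop iteration per one in the input
    return count
```

The loop body runs once per one in the input, which is the input-dependent running time described above.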

By forming growing blocks of bits, complexity O(log n) can be achieved with
the help of constant time shift operations [1]. Under unit cost measure for
multiplication or division, algorithms of asymptotic time complexity O(log log n)
are the Gillies-Miller method [311] and Item 169 of [2].
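
As an illustration of the block-forming approach, the following Python sketch adds neighbouring blocks of doubling length. It is one common way to write the technique, not necessarily the formulation of [1]. The name `count_ones_blocks` and the parameter `n` (word length, assumed to be a power of two) are assumptions made for this example.

```python
def count_ones_blocks(x: int, n: int = 64) -> int:
    """Count ones in an n-bit word with about log2(n) shift/mask/add rounds."""
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of two")
    if not 0 <= x < (1 << n):
        raise ValueError("x must fit into n bits")
    full = (1 << n) - 1
    shift = 1
    while shift < n:
        # Mask with runs of `shift` ones alternating with `shift` zeros,
        # e.g. 0x5555..., 0x3333..., 0x0F0F... for shift = 1, 2, 4.
        mask = full // ((1 << shift) + 1)
        # Add each block to its left neighbour; blocks double in length.
        x = (x & mask) + ((x >> shift) & mask)
        shift *= 2
    return x
```

Every input takes the same number of rounds, which is the oblivious behaviour discussed in the next paragraph.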

In contrast to Wegner’s approach, the asymptotically more efficient solutions 
are oblivious in the sense that their complexity is independent of the input value. 
A sparse input (containing few ones) is not processed more efficiently than an 
input with many ones. If, e.g., the input is known to contain at most a constant 
number of ones, then Wegner’s method has time complexity O(1).

In this note we show that the Gillies-Miller method can be modified to work 
in a non-oblivious way. 
2 Result


Theorem 1 Counting ones can be done in O(log log b) steps under unit cost
measure for logical and arithmetical operations including multiplication, where
b = νx is the number of ones in the input x.

Proof: We describe an algorithm that uses several families of constants (“magic
masks” in the sense of a) that potentially extend infinitely towards higher 
order bits. In concrete implementations these constants can be truncated to the 
word length of the processor architecture and only a finite number of values is 
required. 

The first family m of masks selects blocks of bits depending on parameter k:

$$m[k] = \cdots\underbrace{11\cdots11}_{2^k}\,\underbrace{00\cdots00}_{2^k}\,\underbrace{11\cdots11}_{2^k}$$

The second family h selects the most significant bits from each block:

$$h[k] = \cdots\underbrace{10\cdots00}_{2^k}\,\underbrace{10\cdots00}_{2^k}\,\underbrace{10\cdots00}_{2^k}$$

The masks h are also used in a modified form as multipliers for adding up blocks 
of bits of the current value of x. 

Finally we make use of the table e with e[k] = $2^k$.
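
For a concrete word length the three tables can be computed rather than stored. The Python sketch below truncates the masks to `width` bits. The helper names `m_mask`, `h_mask` and `e_table` are hypothetical, and `width` is assumed to be a power of two that is a multiple of the block length involved (twice the block length for `m_mask`).

```python
def m_mask(k: int, width: int) -> int:
    """m[k] truncated to `width` bits: runs of 2**k ones alternating with 2**k zeros."""
    block = 1 << k
    full = (1 << width) - 1
    return full // ((1 << block) + 1)

def h_mask(k: int, width: int) -> int:
    """h[k] truncated to `width` bits: the most significant bit of every 2**k-bit block."""
    block = 1 << k
    full = (1 << width) - 1
    lowest_bits = full // ((1 << block) - 1)   # a one in the lowest bit of every block
    return lowest_bits << (block - 1)

def e_table(size: int) -> list:
    """e[k] = 2**k for k = 0 .. size - 1."""
    return [1 << k for k in range(size)]

# For an 8-bit word: m[0..2] = 0x55, 0x33, 0x0f and h[0..3] = 0xff, 0xaa, 0x88, 0x80.
```

The multiplier 2h[k] + 1 used later has a one in the lowest bit of every block; truncated to the word length it coincides with the intermediate value `lowest_bits` above.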

Each iteration of the while-loop in the code that follows starts with x consisting
of a sequence of blocks of length $\ell = 2^k$, where each block contains the number
of ones of the input in the corresponding bit positions. Variable p holds the
product of x and 2h[k] + 1. The test concerning p and h[k] determines if
$\ell - 1$ bits suffice to hold the count of all ones in the input. In fact, the test could
be a little less strict with respect to the most significant bit, which may be a 1
in the block containing the sum of all blocks.
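
To see why the product is useful, note that a multiplier with a one in the lowest bit of every block turns the blocks of x into their running sums. The following Python sketch illustrates this for a word truncated to the blocks given; the function name and the list-based interface are assumptions made for the example.

```python
def prefix_sums_by_multiplication(blocks, block_len):
    """Pack `blocks` (least significant first) into a word, multiply by the
    truncated constant 2*h[k] + 1 and return the blocks of the product."""
    width = block_len * len(blocks)
    full = (1 << width) - 1
    x = 0
    for i, value in enumerate(blocks):
        x |= value << (i * block_len)
    # One in the lowest bit of every block: 2*h[k] + 1 truncated to `width` bits.
    multiplier = full // ((1 << block_len) - 1)
    p = (x * multiplier) & full
    block_mask = (1 << block_len) - 1
    return [(p >> (i * block_len)) & block_mask for i in range(len(blocks))]

# Blocks 1, 0, 2, 1 of length 4 give the running sums 1, 1, 3, 4.
print(prefix_sums_by_multiplication([1, 0, 2, 1], 4))
```

In this example no running sum reaches 8, so no block of the product has its most significant bit set and the test against h[k] would yield 0.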


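function bitcount(x: integer): integer;
var k, p: integer;
begin
  k := 0;
  p := -x;                          { x * (2*h[0] + 1), as h[0] consists of ones only }
  while (p and h[k]) <> 0 do
  begin
    { add neighbouring blocks of length e[k] }
    x := (x and m[k]) + ((x div e[e[k]]) and m[k]);
    { running sums of the new blocks of length e[k+1] }
    p := x * (2*h[k+1] + 1);
    k := k+1
  end;
  { extract the middle block of p }
  bitcount := (p div e[(n div 2) - e[k]]) and (e[e[k]]-1)
end;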


We now argue that the above algorithm determines whether an overflow to
the most significant bit of the blocks being added up occurs. Let $\ell = 2^k > 1$ be
a block-length and consider the current x as a sequence of blocks $x_{n/\ell} \cdots x_1$ of
length $\ell$. We claim that if $\sum_{i=1}^{n/\ell} x_i \ge 2^{\ell-1}$ then there is a $j \le n/\ell$ such that
$2^{\ell-1} \le \sum_{i=1}^{j} x_i \le 2^{\ell} - 1$. If $x_1 \ge 2^{\ell-1}$ then we can take $j = 1$, since a block
counts at most $\ell$ ones and $\ell \le 2^{\ell-1} < 2^{\ell} - 1$ for $\ell > 1$. Otherwise there is a
maximum s such that $\sum_{i=1}^{s} x_i \le 2^{\ell-1} - 1$. Then
$\sum_{i=1}^{s+1} x_i \le 2^{\ell-1} - 1 + 2^{\ell-1} = 2^{\ell} - 1$ and we can set $j = s + 1$. Now $\sum_{i=1}^{j} x_i$ has bit
$\ell$ set such that “p and h[k]” will not be 0.
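
The claim can be checked exhaustively for small parameters. The Python sketch below enumerates all block sequences in which each block holds at most $\ell$ ones; the function names are hypothetical, and the check covers only the tiny cases it enumerates, so it does not replace the argument.

```python
from itertools import product

def first_prefix_in_range(blocks, block_len):
    """Smallest j (1-based) with 2**(l-1) <= x_1 + ... + x_j <= 2**l - 1, or None."""
    low, high = 1 << (block_len - 1), (1 << block_len) - 1
    total = 0
    for j, value in enumerate(blocks, start=1):
        total += value
        if low <= total <= high:
            return j
    return None

def claim_holds(block_len, num_blocks):
    """Check the claim for all sequences of `num_blocks` blocks with values 0..block_len."""
    for blocks in product(range(block_len + 1), repeat=num_blocks):
        if sum(blocks) >= 1 << (block_len - 1):
            if first_prefix_in_range(blocks, block_len) is None:
                return False
    return True
```

For instance, `claim_holds(2, 4)` and `claim_holds(4, 4)` enumerate 81 and 625 sequences respectively.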

When the loop terminates, the Gillies-Miller method will have produced the
sum of all blocks in the “middle” block of p [7]. This block is extracted by the
last assignment.

Since the loop terminates with the smallest k such that $2^{\ell-1} = 2^{2^k-1} > b$,
we have k = O(log log b). □
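
For experimentation, the algorithm can be transcribed into Python with masks truncated to a finite word. This is a sketch under assumptions that are not taken from the text: the input has at most `n` bits (`n` a power of two, at least 4), the computation uses a working word of `2n` bits so that the total ends up in the block just below bit `n`, the middle of the working word, and the masks are computed on the fly instead of being read from tables.

```python
def bitcount(x: int, n: int = 64) -> int:
    """Non-oblivious counting of ones; the loop runs O(log log b) times."""
    if n < 4 or n & (n - 1):
        raise ValueError("n must be a power of two, at least 4")
    if not 0 <= x < (1 << n):
        raise ValueError("x must fit into n bits")
    width = 2 * n                      # assumed working word length
    full = (1 << width) - 1

    def m(k):                          # m[k] truncated to `width` bits
        return full // ((1 << (1 << k)) + 1)

    def h(k):                          # h[k] truncated to `width` bits
        block = 1 << k
        return (full // ((1 << block) - 1)) << (block - 1)

    k = 0
    p = -x & full                      # x * (2*h[0] + 1) modulo 2**width
    while p & h(k):
        block = 1 << k
        x = (x & m(k)) + ((x >> block) & m(k))   # add neighbouring blocks
        p = (x * (2 * h(k + 1) + 1)) & full      # running sums of the new blocks
        k += 1
    block = 1 << k
    # The block ending at bit n holds the sum of all blocks of x.
    return (p >> (n - block)) & ((1 << block) - 1)
```

A quick comparison against a reference such as `bin(x).count("1")` over all 8-bit inputs (with `n=8`) is an easy sanity check of the transcription.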